Significance and Recovery of Block Structures in Binary Matrices with Noise
Authors
Abstract
Frequent itemset mining (FIM) is one of the core problems in the field of Data Mining and occupies a central place in its literature. One equivalent form of FIM can be stated as follows: given a rectangular data matrix with binary entries, find every submatrix of 1s having a minimum number of columns. This paper presents a theoretical analysis of several statistical questions related to this problem when noise is present. We begin by establishing several results concerning the extremal behavior of submatrices of ones in a binary matrix with random entries. These results provide simple significance bounds for the output of FIM algorithms. We then consider the noise sensitivity of FIM algorithms under a simple binary additive noise model, and show that, even at small noise levels, large blocks of 1s leave behind fragments of only logarithmic size. Thus such blocks cannot be directly recovered by FIM algorithms, which search for submatrices of all 1s. On the positive side, we show how, in the presence of noise, an error-tolerant criterion can recover a square submatrix of 1s against a background of 0s, even when the size of the target submatrix is very small.
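The submatrix formulation and the fragmentation effect described above can be illustrated on a toy example. The following is a minimal brute-force sketch, not the paper's method: the function names (`all_ones_submatrices`, `add_noise`, `largest_block_area`, `ones_fraction`) are hypothetical, the noise is applied at hand-picked positions rather than at random, and a plain density measure stands in for the error-tolerant criterion, which the abstract does not specify.

```python
from itertools import combinations


def all_ones_submatrices(matrix, min_cols, min_rows=1):
    """Brute-force search: return (rows, cols) pairs whose entries are all 1.

    For each column set of size >= min_cols, the supporting rows are the rows
    that contain a 1 in every chosen column (the itemset's support).
    Exponential in the number of columns; only meant for tiny examples.
    """
    n_cols = len(matrix[0]) if matrix else 0
    found = []
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            rows = tuple(
                i for i, row in enumerate(matrix) if all(row[j] for j in cols)
            )
            if len(rows) >= min_rows:
                found.append((rows, cols))
    return found


def add_noise(matrix, flips):
    """Binary additive noise: XOR the listed (row, col) entries with 1."""
    noisy = [list(row) for row in matrix]
    for i, j in flips:
        noisy[i][j] ^= 1
    return noisy


def largest_block_area(matrix, min_cols, min_rows=1):
    """Area of the largest all-ones submatrix meeting the size constraints."""
    blocks = all_ones_submatrices(matrix, min_cols, min_rows)
    return max((len(r) * len(c) for r, c in blocks), default=0)


def ones_fraction(matrix, rows, cols):
    """Fraction of 1s in a submatrix (assumed stand-in for error tolerance)."""
    return sum(matrix[i][j] for i in rows for j in cols) / (len(rows) * len(cols))


# A 4x4 block of 1s embedded in a 5x5 matrix of 0s.
clean = [[1 if i < 4 and j < 4 else 0 for j in range(5)] for i in range(5)]
# Flip the four diagonal entries of the block (hand-picked, not random).
noisy = add_noise(clean, [(0, 0), (1, 1), (2, 2), (3, 3)])

print(largest_block_area(clean, 2, 2))   # 16: the full block is found
print(largest_block_area(noisy, 2, 2))   # 4: only 2x2 fragments survive
print(ones_fraction(noisy, range(4), range(4)))  # 0.75: still dense
```

In this toy case, four flipped entries reduce the largest exact block from 16 cells to 4, while the original block still has a 0.75 fraction of 1s, which is the kind of signal a density-based criterion could pick up.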
Similar resources
Free Vibration Analysis of Repetitive Structures using Decomposition, and Divide-Conquer Methods
This paper consists of three sections. In the first section, an efficient method is used for decomposing the canonical matrices associated with repetitive structures. To this end, a cylindrical coordinate system and a special numbering scheme are employed. In the second section, the divide-and-conquer method is used for the eigensolution of these structures, where the matrices are in ...
Lightweight 4x4 MDS Matrices for Hardware-Oriented Cryptographic Primitives
The linear diffusion layer is an important part of lightweight block ciphers and hash functions. This paper presents an efficient class of lightweight 4x4 MDS matrices whose implementation cost equals that of their corresponding inverses. The main target of the paper is hardware-oriented cryptographic primitives, and the implementation cost is measured in terms of the required number ...
Solution of Nonlinear Fredholm-Volterra Integral Equations via Block-Pulse Functions
In this paper, a new, simple direct method for solving nonlinear Fredholm-Volterra integral equations is presented. Using block-pulse (BP) functions, their operational matrices, and Taylor expansion, a nonlinear Fredholm-Volterra integral equation is converted to a nonlinear system. Some numerical examples illustrate the accuracy and reliability of our solutions. Also, the effect of noise shows our solutions ...
Optimal Estimation and Completion of Matrices with Biclustering Structures
Biclustering structures in data matrices were first formalized in a seminal paper by John Hartigan [15] where one seeks to cluster cases and variables simultaneously. Such structures are also prevalent in block modeling of networks. In this paper, we develop a theory for the estimation and completion of matrices with biclustering structures, where the data is a partially observed and noise cont...
Some Optimal Codes From Designs
The binary and ternary codes spanned by the rows of the point by block incidence matrices of some 2-designs and their complementary and orthogonal designs are studied. A new method is also introduced to study optimal codes.
Publication date: 2006